I Don't Know: Explicit Modeling of Uncertainty with an [IDK] Token

Neural Information Processing Systems

Large Language Models are known to capture real-world knowledge, allowing them to excel in many downstream tasks. Despite recent advances, these models are still prone to what are commonly known as hallucinations, causing them to emit unwanted and factually incorrect text. In this work, we propose a novel calibration method that can be used to combat hallucinations. We add a special [IDK] ("I Don't Know") token to the model's vocabulary and introduce an objective function that shifts probability mass to the [IDK] token for incorrect predictions. This approach allows the model to express uncertainty in its output explicitly. We evaluate our proposed method across multiple model architectures and factual downstream tasks. We find that models trained with our method are able to express uncertainty in places where they would previously make mistakes while suffering only a small loss of encoded knowledge. We further perform extensive ablation studies of multiple variations of our approach and provide a detailed analysis of the precision-recall tradeoff of our method.
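The mechanism the abstract describes, a reserved [IDK] token plus an objective that moves target probability mass onto it when the model's prediction is wrong, can be sketched as follows. This is a minimal NumPy illustration of one plausible reading of that objective, not the paper's implementation; the `IDK` index and `idk_weight` value are assumptions made for the example.

```python
import numpy as np

IDK = 0  # index reserved for the special [IDK] token (illustrative choice)

def softmax(z):
    z = z - z.max()  # numerically stable softmax
    e = np.exp(z)
    return e / e.sum()

def idk_cross_entropy(logits, gold, idk_weight=0.5):
    """Sketch of an [IDK]-style objective: if the model's top prediction
    is wrong, move part of the target probability mass from the gold
    token to the [IDK] token; otherwise use plain one-hot cross-entropy."""
    probs = softmax(logits)
    target = np.zeros_like(probs)
    if probs.argmax() == gold:
        target[gold] = 1.0              # correct: ordinary cross-entropy
    else:
        target[gold] = 1.0 - idk_weight
        target[IDK] = idk_weight        # wrong: reward saying "I don't know"
    return float(-(target * np.log(probs + 1e-12)).sum())
```

Under this reading, a confidently wrong model is pushed toward emitting [IDK], while correct predictions are trained exactly as in standard language modeling.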


UNCLE: Benchmarking Uncertainty Expressions in Long-Form Generation

Yang, Ruihan, Zhang, Caiqi, Zhang, Zhisong, Huang, Xinting, Yu, Dong, Collier, Nigel, Yang, Deqing

arXiv.org Artificial Intelligence

Large Language Models (LLMs) are prone to hallucination, particularly in long-form generations. A promising direction to mitigate hallucination is to teach LLMs to express uncertainty explicitly when they lack sufficient knowledge. However, existing work lacks direct and fair evaluation of LLMs' ability to express uncertainty effectively in long-form generation. To address this gap, we first introduce UNCLE, a benchmark designed to evaluate uncertainty expression in both long- and short-form question answering (QA). UNCLE covers five domains and includes more than 1,000 entities, each with paired short- and long-form QA items. Our dataset is the first to directly link short- and long-form QA through aligned questions and gold-standard answers. Along with UNCLE, we propose a suite of new metrics to assess the models' capabilities to selectively express uncertainty. We then demonstrate that current models fail to convey uncertainty appropriately in long-form generation. We further explore both prompt-based and training-based methods to improve models' performance, with the training-based methods yielding greater gains. Further analysis of alignment gaps between short- and long-form uncertainty expression highlights promising directions for future research using UNCLE.



LoGU: Long-form Generation with Uncertainty Expressions

Yang, Ruihan, Zhang, Caiqi, Zhang, Zhisong, Huang, Xinting, Yang, Sen, Collier, Nigel, Yu, Dong, Yang, Deqing

arXiv.org Artificial Intelligence

While Large Language Models (LLMs) demonstrate impressive capabilities, they still produce factually incorrect content (i.e., hallucinations). A promising approach to mitigate this issue is enabling models to express uncertainty when unsure. Previous research on uncertainty modeling has primarily focused on short-form QA, but real-world applications often require much longer responses. In this work, we introduce the task of Long-form Generation with Uncertainty (LoGU). We identify two key challenges: Uncertainty Suppression, where models hesitate to express uncertainty, and Uncertainty Misalignment, where models convey uncertainty inaccurately. To tackle these challenges, we propose a refinement-based data collection framework and a two-stage training pipeline. Our framework adopts a divide-and-conquer strategy, refining uncertainty based on atomic claims. The collected data are then used in training through supervised fine-tuning (SFT) and direct preference optimization (DPO) to enhance uncertainty expression. Extensive experiments on three long-form instruction following datasets show that our method significantly improves accuracy, reduces hallucinations, and maintains the comprehensiveness of responses.
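The atomic-claim refinement step described above can be pictured with a toy sketch: split a draft answer into claims, keep supported ones verbatim, and rewrite unsupported ones as explicit uncertainty expressions. The upstream claim splitter and fact verifier are assumed to exist, and the hedging template is invented for illustration; this is not the authors' pipeline.

```python
def refine_with_uncertainty(claims, supported):
    """Toy refinement step: keep supported atomic claims verbatim and
    rewrite unsupported ones as explicit uncertainty expressions.
    `claims` is a list of sentences; `supported` holds matching booleans
    produced by some upstream verifier (assumed here)."""
    refined = []
    for claim, ok in zip(claims, supported):
        if ok:
            refined.append(claim)
        else:
            # invented hedging template; a real system would rewrite more fluently
            refined.append(f"I am not sure, but it may be that {claim[0].lower()}{claim[1:]}")
    return " ".join(refined)
```

Pairs of (original draft, refined draft) of this shape could then serve as the kind of SFT targets or DPO preference pairs the abstract mentions.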


Prudent Silence or Foolish Babble? Examining Large Language Models' Responses to the Unknown

Liu, Genglin, Wang, Xingyao, Yuan, Lifan, Chen, Yangyi, Peng, Hao

arXiv.org Artificial Intelligence

Large Language Models (LLMs) often struggle when faced with situations where they lack the prerequisite knowledge to generate a sensible response. In these cases, models tend to fabricate and hallucinate, rather than appropriately signaling uncertainty as humans would. This behavior misaligns with human conversational norms and presents challenges surrounding responsible and ethical AI development. This work aims to systematically investigate LLMs' behaviors in such situations. We curate an adversarial question-answering benchmark containing unanswerable questions targeting information absent from the LLM's training data. Concretely, these unanswerable questions contain non-existent concepts or false premises. When presented with such unanswerable questions, an LLM should appropriately convey uncertainty, and be able to challenge the premise and refuse to generate a response. When facing answerable valid questions, a model should demonstrate a positive correlation between accuracy and confidence. Using a model-agnostic unified confidence elicitation approach, we observe that LLMs that have gone through instruction finetuning and reinforcement learning from human feedback (RLHF) perform significantly better than their counterparts that do not. Moreover, uncertainty expression through our elicitation method does not always stay consistent with the perceived confidence of the direct response of an LLM. Our findings call for further research into teaching LLMs to proactively and reliably express uncertainty.


Researchers help AI express uncertainty to improve health monitoring tech

#artificialintelligence

A team of engineering and health researchers has developed a tool that improves the ability of electronic devices to detect when a human patient is coughing, which has applications in health monitoring. The new tool relies on an advanced artificial intelligence (AI) algorithm that helps the AI better identify uncertainty when faced with unexpected data in real-world situations. The paper, "Robust Cough Detection with Out-of-Distribution Detection," is published in the IEEE Journal of Biomedical and Health Informatics. "When AI is being trained to identify the sound of coughing, this is usually done with 'clean' data--there is not a lot of background noise or confusing sounds," says Edgar Lobaton, corresponding author of a paper on the work and an associate professor of electrical and computer engineering at North Carolina State University. "But the real world is full of background noise and confusing sounds. So previous cough detection technologies often struggled with 'false positives'--they would say that someone was coughing even if nobody was coughing. We've developed an algorithm that helps us address this problem by allowing an AI to express uncertainty."


Understanding the Uncertainty Loop of Human-Robot Interaction

Leusmann, Jan, Wang, Chao, Gienger, Michael, Schmidt, Albrecht, Mayer, Sven

arXiv.org Artificial Intelligence

Recently, the field of Human-Robot Interaction has gained popularity due to the wide range of ways robots can support humans in daily tasks. One form of supportive robot is the socially assistive robot, which is built specifically for communicating with humans, e.g., as a service robot or personal companion. Because these robots understand humans through artificial intelligence, they will at some point make wrong assumptions about the human's current state and give an unexpected response. In human-human conversations, unexpected responses happen frequently. However, it is currently unclear how such robots should act once they recognize that the human did not expect their response, or how they might signal the uncertainty of a response in the first place. To this end, we explore the different forms of potential uncertainty during human-robot conversations and how humanoids can communicate these uncertainties through verbal and non-verbal cues.